Method and apparatus for retrieval of similar heart sounds from a database

ABSTRACT

The present invention exploits a visual rendering of heart sounds and models the morphological variations of audio envelopes through a constrained non-rigid translation transform. Similar heart sounds are then retrieved by recovering the corresponding alignment transform using a variant of shape-based dynamic time warping.

BACKGROUND OF THE INVENTION

1. Field of Invention

The present invention relates generally to the field of audio pattern matching. More specifically, the present invention is related to matching heart sounds to previously diagnosed heart sounds to assist in physician decision-making.

2. Discussion of Related Art

Multimedia data is widely used in medical diagnosis through auditory and visual examination by physicians. In particular, heart auscultations and ECGs are two very important and commonly used diagnostic aids in cardiovascular disease diagnosis. While physicians routinely perform diagnosis by simple heart auscultation and visual examination of ECG waveform shapes, these two modalities reveal different diagnostic information about the heart. The heartbeat, for example, can reveal abnormal sounds caused by valvular disease, pericarditis, and dysrhythmia. On the other hand, the ECG unveils abnormal electrical activity during the contraction of the myocardium.

Single ECG analysis and ECG classification are well-researched fields, and popular techniques include neural networks, machine learning methods, wavelet transforms, and genetic algorithms. Rule-based methods rely on the accuracy of the P-Q-R-S-T segment detection. Errors in estimation of these feature values can cause major errors in disease-specific interpretation. Parametric modeling methods, on the other hand, are good at spotting major disease differences but cannot take into account fine morphological variability due to heart rate (e.g., ventricular vs. supra-ventricular tachycardia) and physiological differences. Related work in the time alignment of ECGs also exists.

Automatic analysis of heart sounds has been investigated for detecting heart abnormalities. The predominant approach in heart sound analysis is based on feature extraction and classification, as is conventional for audio analysis. These features can be roughly classified into two categories: spatio-temporal features, such as the zero-crossing rate (ZCR) and hidden Markov features, and frequency-domain features, such as Mel-frequency cepstral coefficients (MFCC).

U.S. Pat. Nos. 5,273,049; 7,031,765; and 6,480,737 describe matching electrocardiogram data to detect specific heart defects. These references do not match any sort of sound data. U.S. Pat. Nos. 5,218,969 and 5,025,809 process heart sound data to extract features (pitch/frequency, phase/sub-phase, etc.) from the heart sound and compare those extracted features to ranges of values which trigger rules for various diseases.

Whatever the precise merits, features, and advantages of the above cited references, none of them achieves or fulfills the purposes of the present invention.

SUMMARY OF THE INVENTION

In one embodiment, the present invention provides a computer-implemented method for detecting audio similarity of heart sounds comprising: recording a first heart sound, pre-processing the first heart sound, wherein the pre-processing comprises: selecting a time duration of a portion of the first heart sound to use as a second heart sound, constructing a line segment approximation of the second heart sound, defining an audio envelope around the second heart sound using the line segment approximation, isolating a plurality of fiducial points on the audio envelope, and matching the second heart sound to a plurality of similar heart sounds, wherein the matching comprises: identifying a non-rigid alignment transform based on a shape-based dynamic time warping to determine a correspondence between the fiducial points of the audio envelope and fiducial points from a database of heart sounds, defining a measure of shape similarity by determining a ratio of matched fiducial points to a total number of fiducial points, and ranking the matching based on the ratio.

In another embodiment, the present invention provides for a computer program product comprising: a computer usable medium having computer usable program code for detecting audio similarity of heart sounds, said computer program product including: computer usable program code for recording a first heart sound, computer usable program code for pre-processing the first heart sound, wherein the computer usable program code for pre-processing comprises: program code for selecting a time duration of a portion of the first heart sound to use as a second heart sound, computer usable program code for constructing a line segment approximation of the second heart sound, computer usable program code for defining an audio envelope around the second heart sound using the line segment approximation, computer usable program code for isolating a plurality of fiducial points on the audio envelope, and computer usable program code for matching the second heart sound to a plurality of similar heart sounds, wherein the computer usable program code matching comprises: program code for identifying a non-rigid alignment transform based on a shape-based dynamic time warping to determine a correspondence between the fiducial points of the audio envelope and fiducial points from a database of heart sounds, program code for defining a measure of shape similarity by determining a ratio of matched fiducial points to a total number of fiducial points, program code for ranking the matching based on the ratio.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a computer hardware system for implementing one embodiment of the present invention.

FIG. 2 illustrates a flowchart of the operation of an embodiment of the present invention.

FIG. 3 illustrates a line segment approximation of a recorded sound.

FIG. 4 a illustrates an audio envelope matching the line segment approximation of FIG. 3.

FIG. 4 b illustrates a prior art result of using a homomorphic filter on the line segment approximation of FIG. 3.

FIGS. 5 a and 5 b illustrate the results of the pre-processing step of periodicity detection on a recorded heart sound and a database heart sound, respectively.

FIGS. 5 c and 5 d illustrate the results of amplitude and time normalization on a recorded heart sound and a database heart sound, respectively.

FIGS. 5 e and 5 f illustrate the fiducial points on the audio envelope of a recorded heart sound and a database heart sound, respectively.

FIG. 5 g illustrates the projection of a set of query fiducial points onto the matching database curve.

FIG. 5 h illustrates the shape-based dynamic time warping alignment used in FIG. 5 g.

FIG. 6 illustrates the accounting of insertions and gaps in the computation of the match between two sound signals.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

While this invention is illustrated and described in a preferred embodiment, the invention may be produced in many different configurations. There is depicted in the drawings, and will herein be described in detail, a preferred embodiment of the invention, with the understanding that the present disclosure is to be considered as an exemplification of the principles of the invention and the associated functional specifications for its construction and is not intended to limit the invention to the embodiment illustrated. Those skilled in the art will envision many other possible variations within the scope of the present invention.

The present invention records sounds for comparison with a database of potentially similar sounds. While the preferred embodiment comprises heart sounds compared with a database of heart sounds for disease and defect identification, the present invention may be used in other applications such as bird songs or human voices.

FIG. 1 shows a computer hardware system 100 used in implementing the method and program code of a preferred embodiment. Computer system 100 comprises a processor 110, memory 120, microphone 130, input/output interface 140, and database 150. Memory 120 stores sounds recorded by microphone 130. In one embodiment, the recorded sounds are heart sounds from a patient. Input/output interface 140 communicates with database 150, which stores a plurality of stored sounds. In one embodiment, the stored sounds comprise previously recorded heart sounds, which have been categorized according to diagnosis. After the similar heart sounds are identified from the database, their associated disease labels are used to form a distribution of related diseases for physician decision support.
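As an illustration only, the formation of a disease distribution from the labels of retrieved heart sounds may be sketched in Python as follows. The function name disease_distribution and the example labels are hypothetical and are not part of the disclosed embodiment; the sketch assumes that each retrieved heart sound carries exactly one disease label.

```python
from collections import Counter


def disease_distribution(labels):
    """Return the fraction of retrieved heart sounds per disease label.

    `labels` holds one disease label per retrieved database sound
    (an assumed data layout, chosen only for this sketch).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    # most_common() orders labels from most to least frequent.
    return {label: n / total for label, n in counts.most_common()}


# Hypothetical labels of four retrieved sounds.
retrieved = ["mitral stenosis", "aortic stenosis", "mitral stenosis", "normal"]
distribution = disease_distribution(retrieved)
```

Such a distribution could be presented to the physician alongside the ranked matches.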

FIG. 2 shows the overall operation of the preferred embodiment as performed by computer hardware system 100. Beginning with step 210, microphone 130 records a sound and stores it in memory 120. Processor 110 preprocesses the sound in step 220 by selecting a time period of interest, generally corresponding to one period of a periodic sound, such as a single heart beat. In step 230, processor 110 constructs a line segment approximation of the single period of the sound, as shown in FIG. 3. In step 240, processor 110 defines an audio envelope around the line segment approximation, as shown in FIG. 4 a. In step 250, processor 110 isolates fiducial points from the audio envelope, as shown in FIGS. 5 e and 5 f. In step 260, processor 110 matches the fiducial points from the audio envelope of the recorded sound with fiducial points from the database sounds obtained through input/output interface 140.

A more detailed discussion of the pre-processing and modeling the shape variation of heart sounds performed by processor 110 will now be presented.

Feature Pre-Processing—Periodicity Detection

While normal heart sounds show good repetition, the periodicity is surprisingly difficult to spot for abnormal heart sounds where the repetitions can be nested or irregular (arrhythmias). Simple auto-correlation is often insufficient for this purpose. A robust periodicity detector treats periodicity detection as the problem of recovering a shift/translation that best aligns two self-similar curves. Consider a periodic curve g(t) with period T. Then by definition g(t)=g(t+kT) for all multiples k=0,1,2, . . . . Consider a candidate period τ. Form a curve f(t) by shifting g(t) by τ, i.e.

$\begin{matrix} {{f(t)} = \left\{ \begin{matrix} {g\left( {t - \tau} \right)} & {{{if}\mspace{14mu} t} \geq \tau} \\ {g(t)} & {otherwise} \end{matrix} \right.} & (1) \end{matrix}$

Then define a function R(τ) that records the number of curve features that can be verified to satisfy the periodicity condition based on the current estimate of the period as

$\begin{matrix} {{R(\tau)} = {{\frac{\left\{ {g\left( t_{i} \right)} \right\} }{\max \left\{ {{N - \tau},\tau} \right\}}\mspace{14mu} {such}\mspace{14mu} {that}\mspace{14mu} {{{g\left( t_{i} \right)} - {f\left( t_{j} \right)}}}} \leq ɛ}} & (2) \end{matrix}$

where N is the total number of points on the curve, $t_i$ and $t_j$ are time values of matching fiducial points, and $\varepsilon$ is an error tolerance. The above function (2) can be computed in linear time in comparison to the quadratic time for the autocorrelation function. The function R(τ) shows peaks at precisely those shifts which correspond to periodic repetitions of the curve. The most likely period is then taken as the smallest τ with the most integer multiples in the allowed range of heart beats (40-180 beats/minute). There may be more than one candidate for period, particularly when the periodic repetitions are nested. The present invention's algorithm finds such overlapping repetitions and tests the dynamic time warping algorithm with each such choice.
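A minimal Python sketch of the periodicity function of Equation (2) is given here for illustration. It assumes a uniformly sampled curve stored as a list, and it assumes that a point of g is matched against the shifted curve f at the same sample index, which is only one possible reading of the matching condition. The names periodicity_score and score_candidate_periods, and the default tolerance, are hypothetical.

```python
import math


def periodicity_score(g, tau, eps):
    """R(tau): fraction of points that repeat after a shift of tau samples."""
    n = len(g)
    # For t >= tau the shifted curve is f(t) = g(t - tau), as in Equation (1).
    matches = sum(1 for t in range(tau, n) if abs(g[t] - g[t - tau]) <= eps)
    return matches / max(n - tau, tau)


def score_candidate_periods(g, fs, eps=0.01, bpm_range=(40, 180)):
    """Score every shift whose implied heart rate lies within bpm_range."""
    shortest = max(1, round(fs * 60.0 / bpm_range[1]))  # fastest allowed rate
    longest = min(round(fs * 60.0 / bpm_range[0]), len(g) - 1)
    return {tau: periodicity_score(g, tau, eps)
            for tau in range(shortest, longest + 1)}


# Synthetic check: a sinusoid repeating every 100 samples at fs = 100 Hz.
signal = [math.sin(2 * math.pi * k / 100) for k in range(500)]
scores = score_candidate_periods(signal, fs=100)
best_tau = max(scores, key=scores.get)
```

Each evaluation of periodicity_score makes a single pass over the curve. Selecting among several peaks for nested repetitions, as described above, is not covered by this sketch.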

Modeling Shape Variations of Audio Envelopes

Various algorithms are available in the literature for envelope extraction from signals, including homomorphic filtering. While these algorithms are less sensitive to noise-related fluctuations, they frequently extract the low frequency component of the signal rather than render a faithful approximation of the perceptual envelope. In the present invention's approach, the heart sound (within a single heart beat) is modeled through a perceptual envelope.

To extract the audio envelope curve, processor 110 performs noise filtering using wavelet filters to remove the hissing noise that comes from digital stethoscopes. Processor 110 then forms a line segment approximation to the audio signal. This is a standard split-and-merge algorithm for line segment approximation that uses two thresholds, namely, a distance threshold δ and a line length threshold I, to recursively partition the audio signal into a set of line segments. Each consecutive pair of line segments then defines a corner feature $C_i$. With δ=0.01 and I=5, a faithful rendering of the audio signal is made possible through a line segment approximation while retaining only 10% of the samples. The thresholds for curve parameterization are not critical here, as the shape matching algorithm presented below is robust to missing and spurious features. The audio envelope (AE) is defined as the set of points

$AE = \{P_i\}$ where $P_i = C_j$ for some $j$ such that

$P_i(y) \geq P_{i-1}(y)$ and $P_i(y) \geq P_{i+1}(y)$ and $P_i(y) \geq \text{baseline} \qquad (1)$

Only the values of the signal above the baseline are retained in the maxima envelope curve. Similarly, only the peaks below the baseline are retained for the minima envelope curve.
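The envelope definition above can be sketched in Python as follows, for illustration only. The sketch makes several assumptions that are not taken from the disclosure: it uses a simplified split-only recursion rather than a full split-and-merge, it measures vertical rather than perpendicular distance to the chord, and it compares each corner with its neighboring corners when testing for a local maximum. The names split_corners, maxima_envelope, minima_envelope, and min_len are hypothetical.

```python
def split_corners(samples, delta=0.01, min_len=5):
    """Approximate a sampled signal by line segments; return corner points."""
    def split(lo, hi):
        if hi - lo <= min_len:  # segment already short enough
            return []
        y0, y1 = samples[lo], samples[hi]
        worst, worst_d = None, delta
        for k in range(lo + 1, hi):
            chord = y0 + (y1 - y0) * (k - lo) / (hi - lo)
            d = abs(samples[k] - chord)  # vertical deviation from the chord
            if d > worst_d:
                worst, worst_d = k, d
        if worst is None:  # every sample lies within delta of the chord
            return []
        return split(lo, worst) + [worst] + split(worst, hi)

    idx = [0] + split(0, len(samples) - 1) + [len(samples) - 1]
    return [(i, samples[i]) for i in idx]


def maxima_envelope(corners, baseline=0.0):
    """Corners that are local maxima and lie at or above the baseline."""
    env = []
    for i in range(1, len(corners) - 1):
        y_prev, y, y_next = corners[i - 1][1], corners[i][1], corners[i + 1][1]
        if y >= y_prev and y >= y_next and y >= baseline:
            env.append(corners[i])
    return env


def minima_envelope(corners, baseline=0.0):
    """Mirror image of maxima_envelope for peaks below the baseline."""
    flipped = [(t, -y) for t, y in corners]
    return [(t, -y) for t, y in maxima_envelope(flipped, -baseline)]


triangle = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0]
triangle_corners = split_corners(triangle, delta=0.01, min_len=2)
```

In this toy input the only interior corner is the apex at sample 5, which is also the single point of the maxima envelope.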

Once the envelope curve f(t) is extracted, its shape can be represented by the curvature change points or corners on the envelope curve. The shape information at each corner is captured using the following parameters, wherein these parameters are chosen to facilitate matching of audio envelopes:

$S(\vec{f}(t_i)) = \langle t_i, \vec{f}(t_i), \theta(t_i), \phi(t_i) \rangle \qquad (2)$

where $\theta(t_i)$ is the included angle in the corner at $t_i$, and $\phi(t_i)$ is the orientation of the bisector at corner $t_i$. Using the angle of the corner ensures that wider complexes are not matched to narrow complexes, as these can change the disease interpretation. The angular bisector, on the other hand, ensures that polarity reversals such as inverted waves can be captured.
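For illustration, the corner parameters may be computed from three consecutive envelope points as in the following Python sketch. The function name corner_shape is hypothetical, and the sketch assumes that the three points are distinct and that angles are expressed in radians.

```python
import math


def corner_shape(prev, cur, nxt):
    """Return (t, value, theta, phi) for the corner at `cur`.

    theta is the included angle between the two segments meeting at `cur`;
    phi is the orientation of the bisector of that angle.
    """
    ux, uy = prev[0] - cur[0], prev[1] - cur[1]  # vector toward previous point
    vx, vy = nxt[0] - cur[0], nxt[1] - cur[1]    # vector toward next point
    nu, nv = math.hypot(ux, uy), math.hypot(vx, vy)
    cos_theta = (ux * vx + uy * vy) / (nu * nv)
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp rounding error
    # The bisector direction is the sum of the two unit vectors.
    phi = math.atan2(uy / nu + vy / nv, ux / nu + vx / nv)
    return (cur[0], cur[1], theta, phi)


peak = corner_shape((0.0, 0.0), (1.0, 1.0), (2.0, 0.0))
trough = corner_shape((0.0, 0.0), (1.0, -1.0), (2.0, 0.0))
```

In this example the peak and the trough have the same included angle, but their bisectors point in opposite directions, which is how a polarity reversal becomes visible in the feature.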

Modeling Morphological Shape Variations

Referring to FIG. 6, consider an envelope curve g(t) corresponding to a heart sound. Consider another curve f(t) that is a potential match to g(t), i.e. comes from a different patient diagnosed with the same disease. The curve f(t) is considered perceptually similar to g(t) if a non-rigid transform characterized by [a,b,Γ] can be found such that

|f′(t)−g(t)|≦δ  (3)

where | | represents the distance metric that measures the difference between f′(t) and g(t), the simplest being the Euclidean norm and

$f'(t) = a f(\Phi(t))$ with $\Phi(t) = bt + \Gamma(t) \qquad (4)$

where $bt$ is the linear component of the transform and $\Gamma$ is the non-linear translation component. The parameters a and b are recovered by normalizing in amplitude and time. Normalizing in amplitude is done by transforming f(t) and g(t) such that

$\begin{matrix} \begin{matrix} {{\hat{f}(t)} = {\frac{{f(t)} - {f_{\min}(t)}}{{f_{\max}(t)} - {f_{\min \; f}(t)}}\mspace{14mu} {and}}} \\ {{\hat{g}(t)} = \frac{{g(t)} - {g_{\min}(t)}}{{g_{\max}(t)} - {g_{\min \; f}(t)}}} \end{matrix} & (5) \end{matrix}$

so that a=1. To eliminate solving for b, the time axis is normalized by dividing by the heart beat period. Suppose the sampling rate of points on the curve f(t) is FS. Let a periodicity detection algorithm signal the heart beat period to be T₁. After dividing by the heart beat period expressed in samples, all time instants lie in the range [0,1]. Thus the time normalization can be easily achieved as:

$\vec{f}(t) = \hat{f}(t / T_1); \quad \vec{g}(t) = \hat{g}(t / T_2) \qquad (6)$

where $T_1$ and $T_2$ are the heart beat durations of f(t) and g(t) respectively. With this time normalization, b=1. Such amplitude and time normalization automatically makes the shape modeling invariant to amplitude variations in audio recordings, as well as variations in heart rate across patients. Since the non-uniform translation $\Gamma$ is a function of t, computational overhead is avoided by recovering it at important fiducial points such as the corners, and the overall shape approximation is recovered by interpolation. Let there be K features extracted from $\vec{f}(t)$ as $F_K = \{(t_1, \vec{f}_1(t_1)), (t_2, \vec{f}_2(t_2)), \ldots, (t_K, \vec{f}_K(t_K))\}$ at times $\{t_1, t_2, \ldots, t_K\}$, respectively. Let there be M fiducial points extracted from $\vec{g}(t)$ as $G_M = \{(t'_1, \vec{g}_1(t'_1)), (t'_2, \vec{g}_2(t'_2)), \ldots, (t'_M, \vec{g}_M(t'_M))\}$ at times $\{t'_1, t'_2, \ldots, t'_M\}$, respectively. If there is a set of N matching fiducial points $C_\Gamma = \{(t_i, t'_j)\}$, then the non-uniform translation transform $\Gamma$ can be defined as:

$\begin{matrix} {{\Gamma (t)} = \left\{ \begin{matrix} t_{i} & {{{{if}\mspace{14mu} t} = {t_{j}^{\prime}\mspace{14mu} {and}\mspace{14mu} t_{i}}},{t_{j}^{\prime} \in C_{\Gamma}}} \\ {t_{r} + {\left( \frac{t_{s} - t_{r}}{t_{l}^{\prime} - t_{k}^{\prime}} \right)\left( {t - t_{k}^{\prime}} \right)}} & {{{where}\mspace{14mu} \left( {t_{r},t_{k}^{\prime}} \right)},{\left( {t_{s},t_{l}^{\prime}} \right) \in C_{\Gamma}}} \end{matrix} \right.} & (7) \end{matrix}$

and $t'_k$ is the highest of $t'_j \leq t$ and $t'_l$ is the lowest of $t'_j \geq t$ that have a valid mapping in $C_\Gamma$. Other interpolation methods besides linear (e.g., spline) are also possible.
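The normalization of Equations (5) and (6) and the piecewise-linear transform of Equation (7) may be sketched in Python as follows. This is an illustrative sketch only: the names normalize_curve and make_gamma are hypothetical, curves are assumed to be lists of (time, value) pairs with a non-constant value range, the matched pairs are assumed to have distinct target times, and raising an error outside the matched range is a choice made here because the text does not define Γ there.

```python
from bisect import bisect_left


def normalize_curve(points, period):
    """Scale values to [0, 1] (Eq. 5) and times by the heart beat period (Eq. 6)."""
    ys = [y for _, y in points]
    y_min, y_max = min(ys), max(ys)
    return [(t / period, (y - y_min) / (y_max - y_min)) for t, y in points]


def make_gamma(matches):
    """Build Gamma from matched pairs (t_i, t'_j) as in Eq. 7."""
    pairs = sorted(matches, key=lambda p: p[1])  # order by target time t'
    t_f = [p[0] for p in pairs]
    t_g = [p[1] for p in pairs]

    def gamma(t):
        if not t_g[0] <= t <= t_g[-1]:
            raise ValueError("t lies outside the matched range")
        k = bisect_left(t_g, t)  # first index with t_g[k] >= t
        if t_g[k] == t:          # t is itself a matched fiducial point
            return t_f[k]
        # Linear interpolation between the two bracketing matched pairs.
        slope = (t_f[k] - t_f[k - 1]) / (t_g[k] - t_g[k - 1])
        return t_f[k - 1] + slope * (t - t_g[k - 1])

    return gamma


normalized = normalize_curve([(0, 2.0), (50, 6.0), (100, 4.0)], period=100)
gamma = make_gamma([(0.0, 0.0), (0.3, 0.4), (1.0, 1.0)])
```

A spline could replace the linear interpolation step inside gamma without changing the surrounding structure.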

Using Equations 6 and 7, the shape approximation error between the two curves is then given by:

$|f'(t) - g(t)| = |\vec{f}(\Gamma(t)) - \vec{g}(t)| \qquad (8)$

For each g(t), $\Gamma$ is selected such that it minimizes the approximation error in (8) while maximizing the size of the match set $C_\Gamma$. Finding the best matching audio based on shape can then be formulated as finding the g(t) such that

$g_{best} = \arg\min_{g} \left| \vec{f}(\Gamma(t)) - \vec{g}(t) \right| \qquad (9)$

while choosing the best Γ for each respective candidate match g(t).

If the feature sets $F_K$ and $G_M$ extracted from the respective curves are considered as sequences, the problem of computing the best $\Gamma$ reduces to finding the best global subsequence alignment using the dynamic programming principle. The best global alignment maximizes the match of the curve fragments while allowing for possible gaps and insertions. Gaps and insertions correspond to signal fragments from feature set $F_K$ that do not find a match in set $G_M$ and vice versa. The alignment can be computed using a dynamic programming matrix H where the element H(i,j) is the cost of matching up to the ith and jth element in the respective sequences. As more features find a match, the cost increases as little as possible. The dynamic programming step becomes:

$H_{i,j} = \min \begin{cases} H_{i-1,j-1} + d(\vec{f}(t_i), \vec{g}(t'_j)) \\ H_{i-1,j} + d(\vec{f}(t_i), 0) \\ H_{i,j-1} + d(0, \vec{g}(t'_j)) \end{cases} \qquad (10)$

with initialization $H_{0,0} = 0$, $H_{0,j} = \infty$, and $H_{i,0} = \infty$ for all $0 < i \leq K$ and $0 < j \leq M$. Here d(.) is the cost of matching the individual features described next. The first term represents the cost of matching the feature point $\vec{f}(t_i)$ to feature point $\vec{g}(t'_j)$, which is low if the features are similar. The second term represents the choice where no match is assigned to feature $\vec{f}(t_i)$, and the third term represents the choice where no match is assigned to feature $\vec{g}(t'_j)$.
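The recurrence of Equation (10) may be sketched in Python as follows, for illustration only. The function name align and the three cost callables are hypothetical; the sketch accepts any feature representation for which those callables are defined. It follows the stated initialization, under which the first features of the two sequences must be matched to each other, and it adds a traceback step (an assumption of this sketch) to recover the matched index pairs.

```python
import math


def align(F, G, match_cost, gap_cost_f, gap_cost_g):
    """Fill H per Eq. 10 and return (total cost, matched index pairs)."""
    K, M = len(F), len(G)
    H = [[math.inf] * (M + 1) for _ in range(K + 1)]
    H[0][0] = 0.0  # H[0][j] and H[i][0] remain infinite, as in the text
    back = {}
    for i in range(1, K + 1):
        for j in range(1, M + 1):
            options = (
                (H[i - 1][j - 1] + match_cost(F[i - 1], G[j - 1]), (i - 1, j - 1)),
                (H[i - 1][j] + gap_cost_f(F[i - 1]), (i - 1, j)),  # F feature unmatched
                (H[i][j - 1] + gap_cost_g(G[j - 1]), (i, j - 1)),  # G feature unmatched
            )
            H[i][j], back[(i, j)] = min(options, key=lambda o: o[0])
    if math.isinf(H[K][M]):
        return math.inf, []  # no admissible alignment exists
    matches, i, j = [], K, M
    while (i, j) != (0, 0):
        pi, pj = back[(i, j)]
        if (pi, pj) == (i - 1, j - 1):  # a diagonal step records a match
            matches.append((i - 1, j - 1))
        i, j = pi, pj
    matches.reverse()
    return H[K][M], matches


# Toy scalar features with a hypothetical tolerance and unit gap cost.
def toy_match(a, b):
    return abs(a - b) if abs(a - b) <= 0.05 else math.inf


cost, pairs = align([0.1, 0.5, 0.9], [0.1, 0.3, 0.5, 0.9],
                    toy_match, lambda f: 1.0, lambda g: 1.0)
```

In the toy input the second target feature has no counterpart in the query, so the alignment pays one gap cost and matches the remaining three features.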

Shape Similarity of Envelope Curves

After the transformation is recovered, the similarity between two envelope curves is given by the cost function $d(\vec{f}(t_i), \vec{g}(t'_j))$:

$\begin{matrix} {{d\left( {{\overset{->}{f}\left( t_{i} \right)},{\overset{->}{g}\left( t_{j}^{\prime} \right)}} \right)} = \left\{ \begin{matrix} \sqrt{\left( {t_{i} - t_{j}^{\prime}} \right)^{2} + \left( {{\overset{->}{f}\left( t_{i} \right)} - {\overset{->}{g}\left( t_{j}^{\prime} \right)}} \right)^{2} + \left( {{\theta \left( t_{i} \right)} - {\theta \left( t_{j}^{\prime} \right)}} \right)^{2} + \left( {{\phi \left( t_{i} \right)} - {\phi \left( t_{j}^{\prime} \right)}} \right)^{2}} & {{{if}\mspace{14mu} {{t_{i} - t_{j}^{\prime}}}} \leq {{and}\mspace{14mu} \left( {{\overset{->}{f}\left( t_{i} \right)} - {\overset{->}{g}\left( t_{j}^{\prime} \right)}} \right)^{2}} \leq {\lambda_{2}{{{\theta \left( t_{i} \right)} - {\theta \left( t_{j}^{\prime} \right)}}}} \leq {\lambda_{3}{{{\phi \left( t_{i} \right)} - {\phi \left( t_{j}^{\prime} \right)}}}} \leq \lambda_{4}} \\ \infty & {otherwise} \end{matrix} \right.} & (11) \end{matrix}$

The thresholds ($\lambda_1, \lambda_2, \lambda_3, \lambda_4$) are determined through a prior learning phase in which the expected variations per disease class are noted. The cost function $d(\vec{f}(t_i), 0)$ can be computed by substituting $t'_j = 0$, $\vec{g}(t'_j) = 0$, $\theta(t'_j) = 0$, and $\phi(t'_j) = 0$ in Equation 11. The cost function $d(0, \vec{g}(t'_j))$ can be similarly computed.
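A minimal Python sketch of the cost function of Equation (11) follows, for illustration only. Features are assumed to be tuples (t, value, theta, phi), and the threshold values used in the example are hypothetical, since the learned values are not given in the text. For the unmatched case the sketch substitutes zeros into the distance expression only and does not apply the thresholds; that reading is an assumption. Angle differences are taken literally, without wrap-around.

```python
import math


def feature_cost(f, g, lam):
    """d(f, g) per Eq. 11; lam = (lambda_1, lambda_2, lambda_3, lambda_4)."""
    dt, dy, dth, dph = (a - b for a, b in zip(f, g))
    within = (abs(dt) <= lam[0] and dy * dy <= lam[1]
              and abs(dth) <= lam[2] and abs(dph) <= lam[3])
    if not within:
        return math.inf  # features too dissimilar to be matched
    return math.sqrt(dt * dt + dy * dy + dth * dth + dph * dph)


def gap_cost(f):
    """d(f, 0): the distance expression with the other feature set to zero."""
    return math.sqrt(sum(v * v for v in f))


lam = (0.1, 0.04, 0.3, 0.3)  # hypothetical thresholds
near = feature_cost((0.2, 0.5, 1.0, -1.5), (0.25, 0.5, 1.1, -1.5), lam)
far = feature_cost((0.2, 0.5, 1.0, -1.5), (0.6, 0.5, 1.0, -1.5), lam)
```

Because the same gap_cost applies to either sequence, it can serve for both d(f, 0) and d(0, g).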

Thus, using the present invention's approach, two heart sounds are considered similar if a sufficient number of fiducial points between the query and target envelope curves can be matched using shape-based dynamic time warping. In general, due to the period estimation offset errors, the signals may have to be circularly shifted by a fixed translation for a rough alignment before the fine non-rigid alignment described above.
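For illustration, the similarity ratio, the ranking of candidates, and the circular shift may be sketched in Python as follows. The names shape_similarity, rank_matches, and circular_shift and the record identifiers are hypothetical. Taking the total number of fiducial points as the larger of the query and target counts is an assumption of this sketch, as is representing a curve by (time, value) pairs with times normalized to [0, 1).

```python
def shape_similarity(num_matched, num_query, num_target):
    """Ratio of matched fiducial points to the total number of points."""
    total = max(num_query, num_target)  # assumed definition of the total
    return num_matched / total if total else 0.0


def rank_matches(results):
    """Sort (label, matched, query count, target count) by descending ratio."""
    scored = [(shape_similarity(m, k, n), label) for label, m, k, n in results]
    return sorted(scored, key=lambda s: s[0], reverse=True)


def circular_shift(points, shift):
    """Shift normalized times by a fixed offset, wrapping around at 1."""
    return sorted(((t + shift) % 1.0, y) for t, y in points)


ranking = rank_matches([("rec_a", 6, 8, 10), ("rec_b", 8, 8, 10), ("rec_c", 2, 8, 10)])
shifted = circular_shift([(0.0, 1.0), (0.5, 2.0), (0.9, 3.0)], 0.25)
```

A search over a small set of shifts, keeping the one that yields the highest ratio, would be one way to apply the rough alignment before the fine alignment.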

Additionally, the present invention provides for an article of manufacture comprising computer readable program code contained within implementing one or more modules to detect audio similarity of sounds. Furthermore, the present invention includes a computer program code-based product, which is a storage medium having program code stored therein which can be used to instruct a computer to perform any of the methods associated with the present invention. The computer storage medium includes any of, but is not limited to, the following: CD-ROM, DVD, magnetic tape, optical disc, hard drive, floppy disk, ferroelectric memory, flash memory, ferromagnetic memory, optical storage, charge coupled devices, magnetic or optical cards, smart cards, EEPROM, EPROM, RAM, ROM, DRAM, SRAM, SDRAM, or any other appropriate static or dynamic memory or data storage devices.

Implemented in computer program code based products are software modules for recording a sound, selecting a time duration of a portion of the first sound to use as a second sound, constructing a line segment approximation of the second sound, defining an audio envelope around the second sound using the line segment approximation, isolating a plurality of fiducial points on the audio envelope, and matching the second sound to a plurality of similar sounds, wherein the matching comprises: identifying a non-rigid alignment transform based on a shape-based dynamic time warping to determine a correspondence between the fiducial points of the audio envelope and fiducial points from a database of sounds, defining a measure of shape similarity by determining a ratio of matched fiducial points to a total number of fiducial points, and ranking the matching based on the ratio.

Conclusion

A system and method have been shown in the above embodiments for the effective implementation of a method and apparatus for retrieval of sounds from a database. While various preferred embodiments have been shown and described, it will be understood that there is no intent to limit the invention by such disclosure, but rather, it is intended to cover all modifications falling within the spirit and scope of the invention, as defined in the appended claims. For example, the present invention should not be limited by the type of audio.

The above enhancements are implemented in various computing environments. For example, the present invention may be implemented on a conventional IBM PC or equivalent, multi-nodal system (e.g., LAN) or networking system (e.g., Internet, WWW, wireless web). All programming and data related thereto are stored in computer memory, static or dynamic, and may be retrieved by the user in any of: conventional computer storage, display (i.e., CRT) and/or hardcopy (i.e., printed) formats. The programming of the present invention may be implemented by one of skill in the art of audio analysis. 

1. A computer-implemented method for detecting audio similarity of heart sounds comprising: recording a first heart sound; pre-processing the first heart sound to create a second heart sound; constructing a line segment approximation of the second heart sound; defining an audio envelope around the second heart sound using the line segment approximation; isolating a plurality of fiducial points on the audio envelope; and matching the second heart sound to a plurality of similar heart sounds.
 2. The method of claim 1, wherein the step of pre-processing comprises identifying a single periodic segment.
 3. The method of claim 1, wherein the step of pre-processing the first heart sound comprises: forming a curve f(t) by shifting g(t) by a candidate period τ; and defining a function R(τ) that satisfies the periodicity condition based on the candidate period τ as $R(\tau) = \frac{\left| \{ g(t_i) \} \right|}{\max\{N - \tau, \tau\}} \ \text{such that}\ \left| g(t_i) - f(t_j) \right| \leq \varepsilon$ wherein N is the total number of points on curve g(t), $t_i$ and $t_j$ are time values of matching fiducial points, and $\varepsilon$ is an error tolerance.
 4. The method of claim 1, wherein the matching comprises: identifying a non-rigid alignment transform based on a shape-based dynamic time warping to determine a correspondence between the fiducial points of the audio envelope and fiducial points from a database of heart sounds; defining a measure of shape similarity as a function of a number of matched fiducial points and a total number of fiducial points; and outputting the measure of shape similarity.
 5. The method of claim 4, wherein the function comprises a ratio of the number of matched fiducial points to the total number of fiducial points.
 6. The method of claim 5, wherein the matching further comprises ranking the matching based on the ratio.
 7. The method of claim 4, wherein the database of heart sounds comprises a database of heart sounds associated with diseases.
 8. The method of claim 7, wherein the step of outputting the measure of shape similarity further comprises: assembling disease statistics from the database of heart sounds; and presenting the disease statistics as a decision support system showing disease labels that are associated with a plurality of heart sounds from the database of heart sounds.
 9. The method of claim 4, wherein the matching further comprises time scaling and amplitude scaling.
 10. The method of claim 1, wherein the fiducial points comprise local extremes of the audio envelope only on one side of a baseline of the second heart sound.
 11. A computer program product comprising: a computer usable medium having computer usable program code for detecting audio similarity of heart sounds, said computer program product including: computer usable program code for recording a first heart sound; computer usable program code for pre-processing the first heart sound, wherein the computer usable program code for pre-processing comprises: program code for selecting a time duration of a portion of the first heart sound to use as a second heart sound; computer usable program code for constructing a line segment approximation of the second heart sound; computer usable program code for defining an audio envelope around the second heart sound using the line segment approximation; computer usable program code for isolating a plurality of fiducial points on the audio envelope; and computer usable program code for matching the second heart sound to a plurality of similar heart sounds, wherein the computer usable program code matching comprises: program code for identifying a non-rigid alignment transform based on a shape-based dynamic time warping to determine a correspondence between the fiducial points of the audio envelope and fiducial points from a database of heart sounds; program code for defining a measure of shape similarity by determining a ratio of matched fiducial points to a total number of fiducial points; program code for ranking the matching based on the ratio.
 12. A system for detecting audio similarity of heart sounds comprising: an audio transducer for recording a first heart sound; a processor; wherein the processor pre-processes the first heart sound to create a second heart sound, constructs a line segment approximation of the second heart sound, defines an audio envelope around the second heart sound using the line segment approximation, and isolates a plurality of fiducial points on the audio envelope; an input/output interface in communication with a database of heart sounds; wherein the processor matches the second heart sound to a plurality of similar heart sounds retrieved from the database through the input-output interface.
 13. The system of claim 12, wherein the processor pre-processes the first heart sound by: forming a curve f(t) by shifting g(t) by a candidate period τ; and defining a function R(τ) that satisfies the periodicity condition based on the candidate period τ as $R(\tau) = \frac{\left| \{ g(t_i) \} \right|}{\max\{N - \tau, \tau\}} \ \text{such that}\ \left| g(t_i) - f(t_j) \right| \leq \varepsilon$ wherein N is the total number of points on curve g(t), $t_i$ and $t_j$ are time values of matching fiducial points, and $\varepsilon$ is an error tolerance.
 14. The system of claim 12, wherein the processor matches the second heart sound to a plurality of heart sounds by: identifying a non-rigid alignment transform based on a shape-based dynamic time warping to determine a correspondence between the fiducial points of the audio envelope and fiducial points from a database of heart sounds; defining a measure of shape similarity as a function of a number of matched fiducial points and a total number of fiducial points; and outputting the measure of shape similarity.
 15. The system of claim 14, wherein the function comprises a ratio of the number of matched fiducial points to the total number of fiducial points.
 16. The system of claim 15, wherein the processor ranks the matching based on the ratio.
 17. The system of claim 14, wherein the database of heart sounds comprises a database of heart sounds associated with diseases.
 18. The system of claim 17, wherein outputting the measure of shape similarity further comprises: assembling disease statistics from the database of heart sounds; and presenting the disease statistics as a decision support system showing disease labels that are associated with a plurality of heart sounds from the database of heart sounds.
 19. The system of claim 14, wherein the processor adjusts the second heart sound by time scaling and amplitude scaling.
 20. The system of claim 12, wherein the fiducial points comprise local extremes of the audio envelope only on one side of a baseline of the second heart sound. 